Brief Overview 1

Column

In this session, we will use the Black Friday data available on [Kaggle](https://www.kaggle.com/datasets/pranavuikey/black-friday-sales-eda) to study how to make the following graphical displays:

Column

Graphical Displays

  • Categorical Data
    • Bar Chart
    • Pie Chart
  • Quantitative Data
    • Histogram
    • Boxplot
    • Scatterplot
    • Line

Common Arguments

Here is a list of common arguments:

  • col: a vector of colors
  • main: title for the plot
  • xlim or ylim: limits for the x and y axis
  • xlab or ylab: a label for the x or y axis
  • font: font used for text, 1=plain, 2=bold, 3=italic, 4=bold italic
  • font.axis: font used for axis annotation
  • cex.axis: relative font size for x and y axes
  • font.lab: font for x and y labels
  • cex.lab: relative font size for x and y labels

Brief Overview 2

Row

In this session, we will use the Black Friday data available on [Kaggle](https://www.kaggle.com/datasets/pranavuikey/black-friday-sales-eda) to study how to make the following graphical displays:

Row

Graphical Displays

  • Categorical Data
    • Bar Chart
    • Pie Chart
  • Quantitative Data
    • Histogram
    • Boxplot
    • Scatterplot
    • Line

Data

Column

First 500 Observations

Column

Description

To understand customer purchase behavior across products of different categories, the retail company “ABC Private Limited” in the United Kingdom shared a purchase summary of various customers for selected high-volume products from the last month. The data contain the following variables.

  • User_ID: User ID
  • Product_ID: Product ID
  • Gender: Sex of User
  • Age: Age in Bins
  • Occupation: Occupation (Masked)
  • City_Category: Category of The City (A,B,C)
  • Stay_In_Current_City_Years: Number of years the user has stayed in the current city
  • Marital_Status: Marital Status
  • Product_Category_1: Product Category (Masked)
  • Product_Category_2: Additional category the product may belong to (Masked)
  • Product_Category_3: Additional category the product may belong to (Masked)
  • Purchase: Purchase Amount
Rows: 550,068
Columns: 12
$ User_ID                    <dbl> 1000001, 1000001, 1000001, 1000001, 1000002…
$ Product_ID                 <chr> "P00069042", "P00248942", "P00087842", "P00…
$ Gender                     <chr> "F", "F", "F", "F", "M", "M", "M", "M", "M"…
$ Age                        <chr> "0-17", "0-17", "0-17", "0-17", "55+", "26-…
$ Occupation                 <dbl> 10, 10, 10, 10, 16, 15, 7, 7, 7, 20, 20, 20…
$ City_Category              <chr> "A", "A", "A", "A", "C", "A", "B", "B", "B"…
$ Stay_In_Current_City_Years <chr> "2", "2", "2", "2", "4+", "3", "2", "2", "2…
$ Marital_Status             <dbl> 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0…
$ Product_Category_1         <dbl> 3, 1, 12, 12, 8, 1, 1, 1, 1, 8, 5, 8, 8, 1,…
$ Product_Category_2         <dbl> NA, 6, NA, 14, NA, 2, 8, 15, 16, NA, 11, NA…
$ Product_Category_3         <dbl> NA, 14, NA, NA, NA, NA, 17, NA, NA, NA, NA,…
$ Purchase                   <dbl> 8370, 15200, 1422, 1057, 7969, 15227, 19215…

Bar Chart

Row

A bar chart is a graphical display suitable for a general audience. We will study the distribution of the age groups of the company's customers who purchased products on Black Friday

Usage: barplot(height,…)

col

Analysis

The distribution of purchases across age groups is slightly right-skewed and peaks at the 26-35 group, which has almost twice as many purchases as any other group.

Row

Vertical Bar Chart

Horizontal Bar Chart

Pie Chart

Column

Similarly, we can use pie charts to study the distribution of the city category. Usage: pie(x, …)

Analysis

The purchases are spread fairly evenly across the three city categories; no category has a markedly larger or smaller share than the others.

Column

Distribution of City Category

Histogram

Column

A histogram is used when we want to study the distribution of a quantitative variable. Here we study the distribution of customer purchase amount. Usage: hist(x, …)

Column

Analysis

The distribution appears right-skewed, but there are several additional peaks toward the right-hand side of the graph. I would say that a histogram isn’t the best way to view these data because of these secondary peaks.

Boxplot

Column

Boxplot 1

Boxplot 2

In general, side-by-side boxplots are used when we want to compare the distribution of a quantitative variable across several groups. In the following, we study the distribution of customer purchase amount by sex and marital status.

Column

Analysis

The first boxplot shows a large number of outliers. In the second set of boxplots, the distributions for single females, single males, married females, and married males look quite similar. All four groups have many outliers, so a boxplot may not be the best display for analyzing these data.

Line Plot

Column

Data

Since the Black Friday data are not time series data, a line plot is not appropriate for them. In the following code chunk, we create a data frame using the forecasted highest temperatures from July 13 to July 22, 2022 ([The Weather Channel](https://weather.com)).

Analysis

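This graph compares the forecasted high temperatures for one city in each of four states. The four series do not follow a common pattern: Houston's highs fall steadily from 100 to 91, Dayton's rise from 84 to about 91, and Denver's and Fargo's fluctuate without a clear trend. This is plausible given how far apart the cities are.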

Column

Line Chart

---
title: "Basic Graphical Displays"
output: 
  flexdashboard::flex_dashboard:
    theme:
      version: 4
      bootswatch: minty
      navbar-bg: "purple"
    orientation: columns
    vertical_layout: fill
    source_code: embed
---

```{r setup, include=FALSE}
library(flexdashboard)
library(tidyverse)
library(DT)
library(plotly)
Friday<-read_csv("Black_Friday.csv")
```
Brief Overview 1
===

Column {data-width=450}
---

In this session, we will use the Black Friday data available on [Kaggle](https://www.kaggle.com/datasets/pranavuikey/black-friday-sales-eda) to study how to make the following graphical displays:

Column {.tabset data-width=550}
---

### Graphical Displays
- Categorical Data
  - Bar Chart
  - Pie Chart
- Quantitative Data 
  - Histogram
  - Boxplot
  - Scatterplot
  - Line

### Common Arguments
Here is a list of common arguments:

- col: a vector of colors
- main: title for the plot
- xlim or ylim: limits for the x and y axis
- xlab or ylab: a label for the x or y axis
- font: font used for text, 1=plain, 2=bold, 3=italic, 4=bold italic
- font.axis: font used for axis annotation
- cex.axis: relative font size for x and y axes
- font.lab: font for x and y labels
- cex.lab: relative font size for x and y labels
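
As a minimal sketch of how these arguments combine in a single call, the block below draws a bar chart from made-up counts. The `counts` vector and the name `bar_mids` are purely illustrative and are not taken from the Black Friday data.

```r
# Illustrative counts only; not taken from the Black Friday data
counts <- c(A = 12, B = 30, C = 18)

bar_mids <- barplot(
  counts,
  col       = c("lightblue", "lightpink", "lightgreen"),  # one color per bar
  main      = "Example Bar Chart",                        # plot title
  xlab      = "Group",                                    # x-axis label
  ylab      = "Count",                                    # y-axis label
  ylim      = c(0, 35),                                   # y-axis limits
  font.lab  = 2,                                          # bold axis labels
  cex.lab   = 1.2,                                        # enlarge axis labels
  font.axis = 3,                                          # italic tick labels
  cex.axis  = 0.9                                         # shrink tick labels
)
```

`barplot()` returns the midpoints of the bars, which can be handy if text or points need to be added on top of them later.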

Brief Overview 2 {data-orientation=rows}
===
Row {data-height=100}
---
In this session, we will use the Black Friday data available on [Kaggle](https://www.kaggle.com/datasets/pranavuikey/black-friday-sales-eda) to study how to make the following graphical displays:

Row {data-height=900}
---
### Graphical Displays 
- Categorical Data
  - Bar Chart
  - Pie Chart
- Quantitative Data
  - Histogram
  - Boxplot
  - Scatterplot
  - Line
  
Data
===
Column {data-width=550}
---
### <b><font size = 4><span Style = "color:blue">First 500 Observations</span></font></b>


```{r show_table}
datatable(Friday[1:500,],rownames=FALSE,colnames=c("User ID","Product ID","Gender","Age","Occupation","City Category","Stay In Current City Years","Marital Status","Product Category 1","Product Category 2","Product Category 3","Purchase"),options=list(pageLength=20))
```

Column {data-width=450}
---
### <font size = 4><span Style = "color:red">Description</span></font>

To understand customer purchase behavior across products of different categories, the retail company "ABC Private Limited" in the United Kingdom shared a purchase summary of various customers for selected high-volume products from the last month. The data contain the following variables.

- User_ID: User ID
- Product_ID: Product ID 
- Gender: Sex of User
- Age: Age in Bins
- Occupation: Occupation (Masked)
- City_Category: Category of The City (A,B,C)
- Stay_In_Current_City_Years: Number of years the user has stayed in the current city
- Marital_Status: Marital Status
- Product_Category_1: Product Category (Masked)
- Product_Category_2: Additional category the product may belong to (Masked)
- Product_Category_3: Additional category the product may belong to (Masked)
- Purchase: Purchase Amount
```{r}
glimpse(Friday)
```
Bar Chart {data-orientation=rows}
===

Row {data-height=350}
---

### A bar chart is a graphical display suitable for a general audience. We will study the distribution of the age groups of the company's customers who purchased products on Black Friday
**Usage:** barplot(height,...)

<span style="color:orange">col</span>
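
The `height` argument of `barplot()` is a vector of bar heights, so categorical data are usually counted with `table()` first. The sketch below assumes a small made-up vector `age` standing in for `Friday$Age`, and shows that base R can also draw horizontal bars with `horiz = TRUE`.

```r
# Hypothetical age-group labels standing in for Friday$Age
age <- c("18-25", "26-35", "26-35", "36-45", "26-35", "18-25")

# table() counts how often each category occurs
age_counts <- table(age)

# barplot() takes the counts as bar heights; horiz = TRUE draws horizontal
# bars and las = 1 keeps the category labels upright
barplot(age_counts, horiz = TRUE, las = 1, col = "lightblue",
        main = "Toy Example", xlab = "Number of Purchases")
```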

### Analysis

The distribution of purchases across age groups is slightly right-skewed and peaks at the 26-35 group, which has almost twice as many purchases as any other group.

Row {data-height=650}
---

### **Vertical Bar Chart**
```{r bar1}
par(mgp=c(4,1,0)) 
par(mar=c(5,7,4,2))
barplot(table(Friday$Age),col="lightblue",main="Distribution of Purchases by Customer's Age",ylab="Number of Purchases",xlab="Age Group")
```

### **Horizontal Bar Chart**
```{r bar2}
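# par() settings only affect base graphics, so they are not needed for ggplot2.
# coord_flip() swaps the axes, which turns the vertical bars horizontal.
bar1 <- Friday %>%
  ggplot(aes(x=Age))+
  geom_bar(fill="#69f")+
  coord_flip()+
  labs(title="Distribution of Purchases by Customer's Age",x="Age Groups",y="Number of Purchases")
# ggplotly() converts the static ggplot into an interactive plotly widget
ggplotly(bar1)
```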
Pie Chart 
===

Column {data-width=500}
---

Similarly, we can use pie charts to study the distribution of the city category.

**Usage:** pie(x, ...)

### Analysis 

The purchases are spread fairly evenly across the three city categories; no category has a markedly larger or smaller share than the others.
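
A statement about how evenly the categories are spread is easier to judge with the shares written out as numbers. The sketch below uses a made-up vector `city` in place of `Friday$City_Category`; the names `city_counts` and `city_share` are introduced here only for illustration.

```r
# Hypothetical city categories standing in for Friday$City_Category
city <- c("A", "B", "B", "C", "B", "A", "C", "B")

city_counts <- table(city)

# prop.table() turns counts into proportions; multiply by 100 for percent
city_share <- round(100 * prop.table(city_counts), 1)
print(city_share)
```

Applying the same two calls to the real column gives the percentages that the pie chart labels display.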

Column {data-width=500}
---

### Distribution of City Category
```{r pie}
H<-table(Friday$City_Category)
percent<-round(100*H/sum(H),1)
pie_labels<-paste(percent,"%",sep="")   
pie(H,main="Distribution of City Category",labels=pie_labels,col=c("#54d","#f67","#f8f"))
legend("topright",c("A","B","C"),cex=0.8,fill=c("#54d","#f67","#f8f"))
```

Histogram
===

Column {data-width=500}
---

### 
A histogram is used when we want to study the distribution of a quantitative variable. Here we study the distribution of customer purchase amount.

**Usage:** hist(x, ...)

```{r histogram}
Friday %>% ggplot(aes(x=Purchase))+
  geom_histogram(fill="blue",bins=30)+   # set the number of bins explicitly (30 is the ggplot2 default)
  labs(title="Distribution of Customer Purchase Amount",
       x="Purchase Amount (British Pounds)")
```
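
The chunk above uses ggplot2, whereas the usage line refers to the base R function `hist()`. As a rough sketch of the base R route, the block below draws a histogram of simulated right-skewed values; the vector `x` is generated only for illustration and is not the Black Friday purchase data.

```r
set.seed(1)  # make the simulated data reproducible

# Simulated right-skewed amounts; NOT the Black Friday purchases
x <- rgamma(1000, shape = 2, scale = 4000)

# breaks is a suggestion for the number of bins; hist() may adjust it
h <- hist(x, breaks = 30, col = "blue",
          main = "Toy Example: Right-Skewed Amounts",
          xlab = "Amount")

# hist() also returns the bin boundaries and the count in each bin
h$counts
```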

Column {data-width=500}
---

### Analysis 

The distribution appears right-skewed, but there are several additional peaks toward the right-hand side of the graph. I would say that a histogram isn't the best way to view these data because of these secondary peaks.

Boxplot
===

Column {.tabset data-width=500}
---

### Boxplot 1
```{r boxplot1}
boxplot(Friday$Purchase,xlab="Purchase Amount",ylab="British Pounds")
```

### Boxplot 2

In general, side-by-side boxplots are used when we want to compare the distribution of a quantitative variable across several groups. In the following, we study the distribution of customer purchase amount by sex and marital status.

```{r boxplot2}
boxplot(Purchase~Gender+Marital_Status,data=Friday,main="Distribution of Purchase by Sex and Marital Status",xlab="Sex and Marital Status",ylab="Purchase",cex.lab=0.75,cex.axis=0.5,
        names=c("Female & Single","Male & Single","Female & Married","Male & Married"))
```

Column {data-width=500}
---

### Analysis

The first boxplot shows a large number of outliers. In the second set of boxplots, the distributions for single females, single males, married females, and married males look quite similar. All four groups have many outliers, so a boxplot may not be the best display for analyzing these data.
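
How many points count as "a lot of outliers" can be checked directly. The sketch below uses simulated values with two deliberately extreme points, not the real purchase amounts, and relies on `boxplot.stats()`, which applies the same 1.5 × IQR rule that `boxplot()` uses to decide which points fall beyond the whiskers.

```r
set.seed(2)  # make the simulated data reproducible

# Simulated values plus two deliberately extreme points (not the real data)
x <- c(rnorm(200, mean = 9000, sd = 2000), 30000, 35000)

# $out holds the observations that boxplot() would draw as individual points
out <- boxplot.stats(x)$out

length(out)               # number of flagged points
length(out) / length(x)   # share of all observations
```

With a data set as large as this one, even a small share of flagged observations shows up as a dense column of points, so the share is often more informative than the visual impression.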

Line Plot 
===

Column {.tabset data-width=350}
---

### Data 
Since the Black Friday data are not time series data, a line plot is not appropriate for them. In the following code chunk, we create a data frame using the forecasted highest temperatures from July 13 to July 22, 2022 ([The Weather Channel](https://weather.com)).

```{r data}
Date<- 13:22
Dayton_OH<- c(84,86,91,89,89,91,92,91,91,91)
Houston_TX<- c(100,97,96,94,94,94,93,93,92,91)
Denver_CO<- c(95,95,89,96,97,96,92,91,95,96)
Fargo_ND<- c(86,80,84,97,90,87,83,84,87,89)
df<- data.frame(Date,Dayton_OH,Houston_TX,Denver_CO,Fargo_ND)
datatable(df,rownames=FALSE,colnames=c("Date","Dayton, OH","Houston, TX","Denver, CO","Fargo, ND"))
```

### Analysis

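This graph compares the forecasted high temperatures for one city in each of four states. The four series do not follow a common pattern: Houston's highs fall steadily from 100 to 91, Dayton's rise from 84 to about 91, and Denver's and Fargo's fluctuate without a clear trend. This is plausible given how far apart the cities are.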
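
Whether the series are correlated can be checked numerically rather than by eye. The sketch below rebuilds the four series listed in the Data tab and computes their pairwise Pearson correlations with `cor()`; the names `temps` and `cor_mat` are introduced here only for illustration. With only ten days of forecasts, the values should be read as descriptive, not as evidence of a lasting relationship.

```r
# Forecast highs as listed in the Data tab
Dayton_OH  <- c(84, 86, 91, 89, 89, 91, 92, 91, 91, 91)
Houston_TX <- c(100, 97, 96, 94, 94, 94, 93, 93, 92, 91)
Denver_CO  <- c(95, 95, 89, 96, 97, 96, 92, 91, 95, 96)
Fargo_ND   <- c(86, 80, 84, 97, 90, 87, 83, 84, 87, 89)
temps <- data.frame(Dayton_OH, Houston_TX, Denver_CO, Fargo_ND)

# Pairwise Pearson correlations between the four series
cor_mat <- round(cor(temps), 2)
print(cor_mat)
```

For these ten days, the Dayton and Houston entry comes out clearly negative, because one series rises over the period while the other falls.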

Column {data-width=650}
---

### Line Chart 

```{r line1}
plot(Date,Dayton_OH,type="o",col="lightblue",xlab="Date in July",ylab="Highest Temperature",ylim=c(80,100))
lines(Date,Houston_TX, type="o",col="lightpink")
lines(Date,Denver_CO,type="o",col="lavender")
lines(Date,Fargo_ND,type="o",col="lightgreen")
legend("topright",legend=c("Dayton, OH","Houston, TX","Denver, CO","Fargo, ND"),col=c("lightblue","lightpink","lavender","lightgreen"),lty=1,pch=1)
```
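
As a possible alternative to calling `lines()` once per city, `matplot()` can draw all columns of a matrix in one call and picks the y-axis range from the data. This is only a sketch; the matrix `temps` and the vector `cols` are names introduced here for illustration, and the values are the same forecasts as above.

```r
Date <- 13:22

# One column per city, using the forecasts from the Data tab
temps <- cbind(
  "Dayton, OH"  = c(84, 86, 91, 89, 89, 91, 92, 91, 91, 91),
  "Houston, TX" = c(100, 97, 96, 94, 94, 94, 93, 93, 92, 91),
  "Denver, CO"  = c(95, 95, 89, 96, 97, 96, 92, 91, 95, 96),
  "Fargo, ND"   = c(86, 80, 84, 97, 90, 87, 83, 84, 87, 89)
)
cols <- c("lightblue", "lightpink", "lavender", "lightgreen")

# matplot() plots each column of temps against Date as its own line
matplot(Date, temps, type = "o", pch = 1, lty = 1, col = cols,
        xlab = "Date in July", ylab = "Highest Temperature")
legend("topright", legend = colnames(temps), col = cols, lty = 1, pch = 1)
```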